
[PoC] Implement some internal functions in plain PHP #18204

Open · wants to merge 1 commit into master

Conversation

@arnaud-lb (Member) commented Mar 31, 2025:

This is a very quick PoC of @staabm's idea in #17536 (comment).

This PR implements array_find() and related functions in plain PHP, and removes the C implementation.
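
For reference, here is a minimal sketch of what such a plain-PHP implementation could look like, based on the documented PHP 8.4 semantics of array_find(). This is an illustration, not necessarily the exact code in this PR:

<?php

// Sketch only: return the first value for which $callback returns true,
// or null if no element matches; the callback receives ($value, $key).
// (On a stock PHP 8.4 build this would collide with the native
// array_find(); in this PR's branch the C version is removed.)
function array_find(array $array, callable $callback): mixed
{
    foreach ($array as $key => $value) {
        if ($callback($value, $key)) {
            return $value;
        }
    }
    return null;
}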

Idea

As seen in #17536, internal functions implemented in C are not necessarily faster than plain PHP ones. Therefore, it may be beneficial to implement internal functions in plain PHP code.

Aside from performance, this would have the following benefits:

  • Users can read the code of internal functions, and IDEs can navigate to it
  • Simpler implementations
  • Lower maintenance cost
  • Fewer bugs
  • Less friction when changing the engine or internal APIs
  • Less internal recursion

Implementation

This PR is mostly a means to demonstrate the concept and open the discussion, not a proper implementation. For instance, I use opcache.preload to load array.php, although a proper implementation would use a similar mechanism.

The .php files could either be embedded in the binary or installed separately on the file system; both approaches have pros and cons.
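
To make the preload mechanism concrete: opcache.preload points at a PHP script that is compiled and executed once at startup, and every function or class it declares stays available to all subsequently executed code without any include/require. A minimal illustration (hypothetical file name; array_any() semantics taken from the PHP 8.4 documentation):

<?php
// preload.php: everything declared here is compiled once at startup and
// stays in memory for the rest of the process/server lifetime.
// Guarded so the sketch also runs on PHP versions that already ship a
// native array_any().
if (!function_exists('array_any')) {
    function array_any(array $array, callable $callback): bool
    {
        foreach ($array as $key => $value) {
            if ($callback($value, $key)) {
                return true;
            }
        }
        return false;
    }
}

This mirrors the -dopcache.preload=ext/standard/array.php flag used in the benchmark invocations below.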

Benchmarks

I'm using the same benchmark script as in #17536.

The plain PHP implementation is 1.08 times faster than master on my machine:

hyperfine '/tmp/base/sapi/cli/php -n -dzend_extension=opcache.so -dopcache.enable_cli=1 /tmp/bench.php' '/tmp/internal-php/sapi/cli/php -n -dzend_extension=opcache.so -dopcache.enable_cli=1 -dopcache.preload=ext/standard/array.php /tmp/bench.php'
Benchmark 1: /tmp/base/sapi/cli/php -n -dzend_extension=opcache.so -dopcache.enable_cli=1 /tmp/bench.php
  Time (mean ± σ):     413.3 ms ±   1.7 ms    [User: 408.0 ms, System: 4.5 ms]
  Range (min … max):   411.3 ms … 416.1 ms    10 runs
 
Benchmark 2: /tmp/internal-php/sapi/cli/php -n -dzend_extension=opcache.so -dopcache.enable_cli=1 -dopcache.preload=ext/standard/array.php /tmp/bench.php
  Time (mean ± σ):     382.0 ms ±   3.3 ms    [User: 376.7 ms, System: 4.5 ms]
  Range (min … max):   379.2 ms … 390.5 ms    10 runs
 
Summary
  /tmp/internal-php/sapi/cli/php -n -dzend_extension=opcache.so -dopcache.enable_cli=1 -dopcache.preload=ext/standard/array.php /tmp/bench.php ran
    1.08 ± 0.01 times faster than /tmp/base/sapi/cli/php -n -dzend_extension=opcache.so -dopcache.enable_cli=1 /tmp/bench.php

When JIT is enabled in both branches, this branch is 2.31 times faster than master:

hyperfine '/tmp/base/sapi/cli/php -n -dzend_extension=opcache.so -dopcache.enable_cli=1 -dopcache.jit=tracing -dopcache.jit_buffer_size=100m /tmp/bench.php' '/tmp/internal-php/sapi/cli/php -n -dzend_extension=opcache.so -dopcache.enable_cli=1 -dopcache.jit=tracing -dopcache.jit_buffer_size=100m -dopcache.preload=ext/standard/array.php /tmp/bench.php'
Benchmark 1: /tmp/base/sapi/cli/php -n -dzend_extension=opcache.so -dopcache.enable_cli=1 -dopcache.jit=tracing -dopcache.jit_buffer_size=100m /tmp/bench.php
  Time (mean ± σ):     360.8 ms ±   2.0 ms    [User: 356.2 ms, System: 3.9 ms]
  Range (min … max):   357.9 ms … 363.3 ms    10 runs
 
Benchmark 2: /tmp/internal-php/sapi/cli/php -n -dzend_extension=opcache.so -dopcache.enable_cli=1 -dopcache.jit=tracing -dopcache.jit_buffer_size=100m -dopcache.preload=ext/standard/array.php /tmp/bench.php
  Time (mean ± σ):     156.0 ms ±   0.8 ms    [User: 150.2 ms, System: 5.3 ms]
  Range (min … max):   154.7 ms … 157.8 ms    19 runs
 
Summary
  /tmp/internal-php/sapi/cli/php -n -dzend_extension=opcache.so -dopcache.enable_cli=1 -dopcache.jit=tracing -dopcache.jit_buffer_size=100m -dopcache.preload=ext/standard/array.php /tmp/bench.php ran
    2.31 ± 0.02 times faster than /tmp/base/sapi/cli/php -n -dzend_extension=opcache.so -dopcache.enable_cli=1 -dopcache.jit=tracing -dopcache.jit_buffer_size=100m /tmp/bench.php

This is because the JIT is able to inline the callback into array_find() in this benchmark, eliminating the function call overhead entirely, which is not possible with the C implementation. (JIT heuristics would probably blacklist some trace entry points once array_find() is used with many different callbacks, but this is something we can tune.) In general, plain PHP code gives the JIT more optimization opportunities.
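
To illustrate the effect, conceptually (a sketch, not actual JIT output; the inputs are hypothetical stand-ins for the benchmark data):

<?php
// Hypothetical inputs standing in for the benchmark data.
$array  = range(1, 10000);
$needle = 5000;

// Source form of a hot call site:
$result = array_find($array, static fn ($value) => $value === $needle);

// Roughly what the trace behaves like once the JIT has inlined both
// array_find() and the closure: a plain loop with no per-element
// function call left.
$result = null;
foreach ($array as $value) {
    if ($value === $needle) {
        $result = $value;
        break;
    }
}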

@iluuu1994 (Member) commented:

Interesting. I wonder if preload scales well, or whether at some point, with enough code, we'll have a noticeable startup delay due to code compilation. It may be possible to store the files in the binary as compiled opcodes, using the file cache mechanism.

@nielsdos (Member) commented:

I would be in favor of this.
I wonder how performance compares between native and PHP in the array_find test on your machine when the closure takes 2 args (to avoid the extra argument copy overhead).
Also note that on Windows we don't have preloading.
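
For clarity, the two variants being compared are presumably along these lines (hypothetical data and callbacks; array_find() passes both $value and $key to the callback):

// Hypothetical benchmark data.
$array = range(1, 10000);

// One-parameter callback: the second argument ($key) is not bound to a
// closure parameter.
$found = array_find($array, static fn ($value) => $value === 5000);

// Two-parameter callback: both arguments are bound, which is the case
// being asked about here.
$found = array_find($array, static fn ($value, $key) => $value === 5000);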

@arnaud-lb (Member, Author) commented:

With a 2-argument closure, native is 1.04 times faster.
With JIT, this branch is 2.24 times faster.

@TimWolla (Member) commented Apr 1, 2025:

Unsurprisingly, I'm also in favor of this, for the maintenance aspect alone. However, as a long-term goal, I would find it important to be able to mix and match native and userland methods within a single class, similar to Java’s native methods.

As an example, SensitiveParameterValue could be mostly implemented in userland code, but it currently has some internals-only behavior (get_properties_for) to hide the values in cases where __debugInfo() is insufficient, e.g. when casting to (array).
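
For context, the userland-expressible part of that class is small; a sketch based on its documented public API (illustrative only; the internals-only get_properties_for behavior has no userland equivalent):

<?php
// Illustrative only: the real class is built in, so this sketch lives in
// its own namespace to avoid a redeclaration error.
namespace Sketch;

final class SensitiveParameterValue
{
    public function __construct(private readonly mixed $value) {}

    public function getValue(): mixed
    {
        return $this->value;
    }

    // Hides the value from var_dump()/print_r(), but unlike the native
    // get_properties_for handler it cannot hide it from an (array) cast.
    public function __debugInfo(): array
    {
        return [];
    }
}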


For the implementation, I would therefore be in favor of putting the implementation right into the stub files and indicating somehow whether a function/method is a stub or contains the actual implementation. That would also allow downstream consumers to use the stub files as-is.
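
As a rough illustration of that direction (hypothetical; stub-generator annotations omitted, and not meant to be executed as a regular script), a stub file could carry real bodies for some entries while leaving others as empty native declarations, as today:

/* Implemented in PHP: the body in the stub is the implementation
   (array_all() semantics from the PHP 8.4 documentation). */
function array_all(array $array, callable $callback): bool
{
    foreach ($array as $key => $value) {
        if (!$callback($value, $key)) {
            return false;
        }
    }
    return true;
}

/* Native: empty body as in current stubs; the implementation stays in C. */
function sort(array &$array, int $flags = SORT_REGULAR): bool {}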

@bukka (Member) commented Apr 2, 2025:

We already have issues with preloading when FPM uses multiple pools with different users/groups and then relies on permissions for some operations, so this would need to be properly tested. There is currently an ugly hack for the single-user case, but with multiple different users it might not work. I haven't tested it for some time, but there were issues around it.

@bukka (Member) commented Apr 2, 2025:

Well, this setup already has issues with opcache (clearing locks); I'm just wondering whether effectively requiring preloading could make it even worse.

@arnaud-lb (Member, Author) commented:

@bukka @nielsdos yes, I used opcache.preload for demonstration purposes here, but a proper implementation would not use opcache.preload or depend on Opcache. There are a few possibilities, such as compiling during startup (without changing uid) and moving op arrays to the native heap, or compiling at build time as @iluuu1994 suggested.

@Appla (Contributor) commented Apr 30, 2025:

> As an example, SensitiveParameterValue could be mostly implemented in userland code, but it currently has some internals-only behavior (get_properties_for) to hide the values in cases where __debugInfo() is insufficient, e.g. when casting to (array).

Would it be feasible to integrate C and PHP through JITed FFI (#14491) to address these types of issues?
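
(For readers unfamiliar with it: the FFI extension already lets userland PHP call into C. A minimal example, assuming a typical Linux system where libc is available as libc.so.6:)

<?php
// Declare a C prototype and call it from PHP through FFI.
$libc = FFI::cdef(
    "size_t strlen(const char *s);",
    "libc.so.6"
);

var_dump($libc->strlen("array_find")); // int(10)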

@ilnytskyi commented:

As just a PHP user without much low-level understanding of the PHP engine, I see that this effect was made possible by "inlining":

> because JIT is able to inline the callback into array_find()

whereas the native C implementation would still go through its C calls and introduce some overhead.

So to leverage this approach, some conditions must be met:

  • JIT is enabled
  • Enough hot paths are executed so the code gets JITted; but PHP wouldn't know upfront whether a function can leverage the JIT. That's managed by these ini settings, right? (See also the status-check sketch after this list.)
opcache.jit_hot_loop = 64
; After how many iterations a loop is considered hot. ...
opcache.jit_hot_func = 127
; After how many calls a function is considered hot ...
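
One way to check whether the JIT is actually active for a given run is to inspect opcache's own status from the script (a sketch; the exact keys can vary by PHP/opcache version, hence the null coalescing):

<?php
// Run with the JIT enabled, e.g.:
//   php -dzend_extension=opcache.so -dopcache.enable_cli=1 \
//       -dopcache.jit=tracing -dopcache.jit_buffer_size=100m this-script.php
$status = opcache_get_status(false);

var_dump($status['jit']['enabled'] ?? null);      // JIT compiled in and enabled?
var_dump($status['jit']['buffer_free'] ?? null);  // unused JIT buffer, in bytes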

I tried to do some research and have a couple of follow-up questions:

  1. Is it feasible to add to PHP something like @ForceInline from the JVM, or [MethodImpl(MethodImplOptions.AggressiveInlining)] from C#?

  2. Maybe we could write userland code like the example below?

#[JIT]
function array_find(array $array, callable $callback): mixed {
    foreach ($array as $key => $value) {
        // #[JIT]? We don't know here whether the callback can be inlined
        if ($callback($value, $key)) {
            return $value;
        }
    }
    return null; // a mixed return type requires an explicit return
}
  3. Would such an attribute need to annotate both the caller and the callback?
  4. Could some callbacks, due to their complexity, bindings, or dependencies, not be JITted at all?
$cb = #[JIT] static function ($item) {
    return $item === 5000;
};

I guess this raises the question of whether we should start using this everywhere, like this one: https://www.reddit.com/r/csharp/comments/8mw0w9/should_i_spam_all_my_methods_with/

On the other hand, if PHP ever supported a force-inlining hint, tools like static analyzers or IDEs could automatically recommend adding it to trivial methods and functions.

Frameworks could include it in their code, like Laravel’s helpers: https://laravel.com/docs/12.x/helpers

It could become as common as the “leading slash” global function import optimization (is_array → \is_array).
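
Concretely, the leading-slash trick matters inside namespaced code, where an unqualified call has to be resolved at run time (current namespace first, then the global function), which prevents the compiler from emitting its specialized instruction for functions like is_array():

<?php

namespace App;

$x = [1, 2, 3];

// Unqualified: resolved at runtime, so no compile-time specialization.
var_dump(is_array($x));

// Fully qualified: known at compile time, so the engine can use a
// cheaper, specialized opcode.
var_dump(\is_array($x));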
